#visual instruction tuning
03/06/2025
LLaDA-V: Revolutionizing Multimodal AI with Purely Diffusion-Based Language Modeling
LLaDA-V introduces a purely diffusion-based approach to multimodal large language modeling and reports strong results on visual instruction tuning and multimodal reasoning across a diverse set of tasks.